186 research outputs found

    Query Expansion of Zero-Hit Subject Searches: Using a Thesaurus in Conjunction with NLP Techniques

    Get PDF
    The focus of our study is zero-hit queries in keyword subject searches and the effort of increasing recall in these cases by reformulating and, then, expanding the initial queries using an external source of knowledge, namely a thesaurus. To this end, the objectives of this study are twofold. First, we perform the mapping of query terms to the thesaurus terms. Second, we use the matched terms to expand the user’s initial query by taking advantage of the thesaurus relations and implementing natural language processing (NLP) techniques. We report on the overall procedure and elaborate on key points and considerations of each step of the process

    Evaluating the application of semantic inferencing rules to image annotation

    Get PDF
    Semantic annotation of digital objects within large multimedia collections is a difficult and challenging task. We describe a method for semi-automatic annotation of images and apply it to and evaluate it on images of pancreatic cells. By comparing the performance of this approach in the pancreatic cell domain with previous results in the fuel cell domain, we aim to determine characteristics of a domain which indicate that the method will or will not work in that domain. We conclude by describing the types of images and domains in which we can expect satisfactory results with this approach. Copyright 2005 ACM

    Semantic Annotation for Retrieval of Visual Resources

    Get PDF
    Beeldmateriaal speelt een steeds grotere rol in onze cultuur, maar ook in de wetenschap en in het onderwijs. Zoeken in grote collecties beeldmateriaal blijft echter een moeizaam proces. Het kost een eindgebruiker veel tijd en moeite om juist dat ene beeld te vinden. Daarom zijn er efficiënte zoekmethoden nodig om de groeiende collecties doorzoekbaar te maken en te houden. Laura Hollink onderzoekt de problemen bij het zoeken naar beeldmateriaal en de mogelijke oplossingen daarvoor, in drie uiteenlopende collecties: schilderijen, foto’s van organische cellen en nieuwsuitzendingen.Schreiber, A.T. [Promotor]Wielinga, B.J. [Promotor]Worring, M. [Copromotor

    Exploring concept representations for concept drift detection

    Get PDF
    We present an approach to estimating concept drift in online news. Our method is to construct temporal concept vectors from topicannotated news articles, and to correlate the distance between the temporal concept vectors with edits to the Wikipedia entries of the concepts. We find improvements in the correlation when we split the news articles based on the amount of articles mentioning a concept, instead of calendar-based units of time

    Learning Semantic Query Suggestions

    Get PDF

    A corpus of images and text in online news

    Get PDF
    In recent years, several datasets have been released that include images and text, giving impulse to new methods that combine natural language processing and computer vision. However, there is a need for datasets of images in their natural textual context. The ION corpus contains 300K news articles published between August 2014 - 2015 in five online newspapers from two countries. The 1-year coverage over multiple publishers ensures a broad scope in terms of topics, image quality and editorial viewpoints. The corpus consists of JSON-LD files with the following data about each article: the original URL of the article on the news publisher’s website, the date of publication, the headline of the article, the URL of the image displayed with the article (if any), and the caption of that image. Neither the article text nor the images themselves are included in the corpus. Instead, the images are distributed as high-dimensional feature vectors extracted from a Convolutional Neural Network, anticipating their use in computer vision tasks. The article text is represented as a list of automatically generated entity and topic annotations in the form of Wikipedia/DBpedia pages. This facilitates the selection of subsets of the corpus for separate analysis or evaluation

    Learning Semantic Query Suggestions

    Get PDF

    Bias in the analysis of multilingual legislative speech

    Get PDF
    In this paper we investigate the application of natural language processing tools to the multilingual proceedings of the European Parliament. This work is part of a study in which we explore (1) how subcorpora in different languages may lead to different conclusions about the political landscape, (2) how to determine what a potential language-related bias originates from, and (3) to what extent we can limit or even prevent an unwanted language-bias

    SWISH DataLab: A Web Interface for Data Exploration and Analysis

    Get PDF
    SWISH DataLab is a single integrated collaborative environment for data processing, exploration and analysis combining Prolog and R. The web interface makes it possible to share the data, the code of all processing steps and the results among researchers; and a versioning system facilitates reproducibility of the research at any chosen point. Using search logs from the National Library of the Netherlands combined with the collection content metadata, we demonstrate how to use SWISH DataLab for all stages of data analysis, using Prolog predicates, graph visualizations, and R

    Metadata categorization for identifying search patterns in a digital library

    Get PDF
    Purpose: For digital libraries, it is useful to understand how users search in a collection. Investigating search patterns can help them to improve the user interface, collection management and search algorithms. However, search patterns may vary widely in different parts of a collection. The purpose of this paper is to demonstrate how to identify these search patterns within a well-curated historical newspaper collection using the existing metadata.Design/methodology/approach: The authors analyzed search logs combined with metadata
    • …
    corecore